Layout Group Extraction from Web Content for Effective Adaptation

Authors

  • Kentarou Fukuda
  • Hironobu Takagi
  • Junji Maeda
  • Chieko Asakawa
Abstract

These days, people access the Web by using various devices and methods, such as PDAs, cellular phones, and voice-based browsers. However, most Web content is designed for desktop computers. Therefore, existing Web content should be transcoded to suit each access device and method. For this purpose, some annotation-based transcoding systems have been developed. An annotation is additional information about Web content, and effective adaptation can be achieved by using it. One of the most difficult problems of annotation is the cost of annotating Web content. Many popular sites, such as news sites, have a large number of Web pages and add new content continually. Hence, it is almost impossible to annotate all of the content on these sites. To solve this problem, we introduce a method to extract common layouts from Web pages. We focus on the structure and characteristics of particular HTML tags that affect the layout of Web pages. Our method calculates the distance between Web pages based on these features. When the distance is below a threshold, the pages can be considered to have the same layout. By using this method, a given annotation can be applied to any Web page that has the same layout. Therefore, the cost of adaptation will be reduced.

Key words: HTML, layout group, transcoding, adaptation, annotation
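The distance-and-threshold idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify which tags are used or how the distance is computed, so everything here is an assumption: the `LAYOUT_TAGS` set, the function names, the use of a sequence-similarity ratio as the distance, and the threshold of 0.2 are all hypothetical stand-ins rather than the authors' method.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser

# Hypothetical set of layout-affecting tags; the abstract does not list them.
LAYOUT_TAGS = {"table", "tr", "td", "th", "div", "frameset", "frame", "ul", "ol", "hr"}


class LayoutSignature(HTMLParser):
    """Collects the ordered sequence of layout-tag openings in a page."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this hook.
        if tag in LAYOUT_TAGS:
            self.tags.append(tag)


def layout_signature(html):
    """Return the list of layout tags in document order."""
    parser = LayoutSignature()
    parser.feed(html)
    parser.close()
    return parser.tags


def layout_distance(html_a, html_b):
    """Return a distance in [0, 1]; 0 means identical layout-tag sequences.

    Assumption: 1 minus difflib's similarity ratio stands in for the
    paper's distance, which the abstract does not define.
    """
    a, b = layout_signature(html_a), layout_signature(html_b)
    return 1.0 - SequenceMatcher(None, a, b, autojunk=False).ratio()


def same_layout(html_a, html_b, threshold=0.2):
    """Treat two pages as one layout group when the distance is below the threshold."""
    return layout_distance(html_a, html_b) < threshold
```

Under these assumptions, two news articles generated from the same template would yield the same tag sequence and a distance of 0, so an annotation written for one could be reused for the other, while a page built from a different template would fall outside the group.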


Similar resources

Extracting Content Structure for Web Pages Based on Visual Representation

A new web content structure based on visual representation is proposed in this paper. Many web applications such as information retrieval, information extraction and automatic page adaptation can benefit from this structure. This paper presents an automatic top-down, tag-tree independent approach to detect web content structure. It simulates how a user understands web layout structure based on ...


VIPS: A VIsion based Page Segmentation Algorithm

A new web content structure analysis based on visual representation is proposed in this paper. Many web applications such as information retrieval, information extraction and automatic page adaptation can benefit from this structure. This paper presents an automatic top-down, tag-tree independent approach to detect web content structure. It simulates how a user understands web layout structure ...


A Layout Based Detachment Approach for Extracting Content from Webpages

Corresponding Author: Deepa Chandran, Department of Information Technology, SNR Sons College, Coimbatore, India. Abstract: An enormous amount of useful information presented on the Internet is usually formatted for web users. But it is a really complex task to extract the relevant data from various web sources. Recently, various approaches for the extraction of data fro...


Collecting and Organizing Web Content

To collect and organize Web content today a user must make bookmarks, print whole webpages, or copy and paste pieces of webpages into a document. We present a framework for assisting the user in managing personal collections of Web content. The user interactively selects the webpage elements of interest, and the system builds an extraction pattern for those elements that is used to automaticall...


Automatic E-Comic Content Adaptation

Reading digital comics on mobile phones is now in demand. Instead of creating new mobile comic content, adapting an existing digital comic web portal is valuable. In this paper, we propose an automatic e-comic mobile content adaptation method that automatically creates mobile comic content from an existing digital comic website portal. Automatic e-comic content adaptation is based on our comic...



Publication date: 2002